
Creators/Authors contains: "Liu, Liyuan"

  1. Free, publicly-accessible full text available June 1, 2024
  2. We show that label noise exists in adversarial training. Such label noise is due to the mismatch between the true label distribution of adversarial examples and the label inherited from clean examples: the true label distribution is distorted by the adversarial perturbation, but this is neglected by the common practice of inheriting labels from clean examples. Recognizing label noise sheds light on the prevalence of robust overfitting in adversarial training, and explains its intriguing dependence on perturbation radius and data quality. Also, our label noise perspective aligns well with our observations of epoch-wise double descent in adversarial training. Guided by our analyses, we propose a method to automatically calibrate the label to address the label noise and robust overfitting. Our method achieves consistent performance improvements across various models and datasets without introducing new hyper-parameters or additional tuning.
  3. Identifying and understanding quality phrases from context is a fundamental task in text mining. The most challenging part of this task arguably lies in uncommon, emerging, and domain-specific phrases. The infrequent nature of these phrases significantly hurts the performance of phrase mining methods that rely on sufficient phrase occurrences in the input corpus. Context-aware tagging models, though not restricted by frequency, rely heavily on domain experts for either massive sentence-level gold labels or handcrafted gazetteers. In this work, we propose UCPhrase, a novel unsupervised context-aware quality phrase tagger. Specifically, we induce high-quality phrase spans as silver labels from consistently co-occurring word sequences within each document. Compared with typical context-agnostic distant supervision based on existing knowledge bases (KBs), our silver labels are deeply rooted in the input domain and context, and thus have unique advantages in preserving contextual completeness and capturing emerging, out-of-KB phrases. Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names. Alternatively, we observe that the contextualized attention maps generated from a Transformer-based neural language model effectively reveal the connections between words in a surface-agnostic way. Therefore, we pair such attention maps with the silver labels to train a lightweight span prediction model, which can be applied to new input to recognize (unseen) quality phrases regardless of their surface names or frequency. Thorough experiments on various tasks and datasets, including corpus-level phrase ranking, document-level keyphrase extraction, and sentence-level phrase tagging, demonstrate the superiority of our design over state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
  4. The identification of the Omicron (B.1.1.529.1 or BA.1) variant of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Botswana in November 2021 [1] immediately caused concern owing to the number of alterations in the spike glycoprotein that could lead to antibody evasion. We [2] and others [3–6] recently reported results confirming such a concern. Continuing surveillance of the evolution of Omicron has since revealed the rise in prevalence of two sublineages, BA.1 with an R346K alteration (BA.1+R346K, also known as BA.1.1) and B.1.1.529.2 (BA.2), with the latter containing 8 unique spike alterations and lacking 13 spike alterations found in BA.1. Here we extended our studies to include antigenic characterization of these new sublineages. Polyclonal sera from patients infected by wild-type SARS-CoV-2 or recipients of current mRNA vaccines showed a substantial loss in neutralizing activity against both BA.1+R346K and BA.2, with drops comparable to those already reported for BA.1 (refs. 2,3,5,6). These findings indicate that these three sublineages of Omicron are antigenically equidistant from the wild-type SARS-CoV-2 and thus similarly threaten the efficacies of current vaccines. BA.2 also exhibited marked resistance to 17 of 19 neutralizing monoclonal antibodies tested, including S309 (sotrovimab) [7], which had retained appreciable activity against BA.1 and BA.1+R346K (refs. 2–4,6). This finding shows that no authorized monoclonal antibody therapy could adequately cover all sublineages of the Omicron variant, except for the recently authorized LY-CoV1404 (bebtelovimab).
  5. In the past decade, the amount of attributed network data has skyrocketed, and the problem of identifying their underlying group structures has received significant attention. By leveraging both attribute and link information, recent state-of-the-art network clustering methods have achieved significant improvements on relatively clean datasets. However, the noisy nature of real-world attributed networks has long been overlooked, which leads to degraded performance when attributes and links are missing or inaccurate. In this work, we overcome such weaknesses by marrying the strengths of clustering and embedding on attributed networks. Specifically, we propose GRACE (GRAph Clustering with Embedding propagation) to simultaneously learn network representations and identify network clusters in an end-to-end manner. It employs deep denoising autoencoders to generate robust network embeddings from node attributes, propagates the embeddings in the network to capture node interactions, and detects clusters based on the stable state of embedding propagation. To provide more insight, we further analyze GRACE theoretically and find its underlying connections with two canonical approaches for network modeling. Extensive experiments on six real-world attributed networks demonstrate the superiority of GRACE over various state-of-the-art baselines. Remarkably, GRACE improves the averaged performance of the strongest baseline from 0.43 to 0.52, yielding a 21% relative improvement. Controlled experiments and case studies further verify our intuitions and demonstrate the ability of GRACE to handle noisy information in real-world attributed networks.
  6.